A FrameNet for Danish

Author

  • Eckhard Bick
Abstract

This paper presents work on a comprehensive FrameNet for Danish (cf. www.framenet.dk), with over 12,000 frames and almost complete coverage of Danish verb lemmas. We discuss design principles and frame roles, as well as the distinctive use of valency, syntactic function and semantic noun classes. By converting frame distinctors into Constraint Grammar rules, we were able to build a robust frame tagger for running Danish text, using DanGram parses as input. The combined context-informed coverage of the parser and frame tagger was 94.3%, with an overall F-score for frame senses of 85.12.

1 The FrameNet concept

Classification of the lexicon is central to many aspects of linguistic research. Modern computational linguistics in particular needs robust classification systems, both to support automatic analysis and for applications such as information extraction and question answering. As the pivot of the sentence, verbs play a special, integrative role in lexical ontologies. Noun ontologies are relatively easy to build around ISA/hypernym relations. Verbs are harder to classify, because structural aspects are meshed with semantics, and complex combinatorial restrictions reside in both a verb's meaning and its syntactic nature. One of the largest ontological resources, WordNet (Fellbaum 1998), does cover verbs but provides little structural-relational information. A number of other classification projects, by contrast, link verb classes to particular verbo-nominal combination patterns and provide information on the form, function and semantics of complements. For English, Levin's original verb classification (Levin 1993) has been expanded in the VerbNet project (Kipper et al. 2006) to include non-NP complements. VerbNet employs 23 (25) thematic roles and 94 semantic predicates. In the FrameNet project (Baker et al. 1998, Johnson & Fillmore 2000, Ruppenhofer et al. 2010), semantic frames like Commerce are drawn up with roles like Buyer, Seller, Goods and Money, which are then associated with verbs (or nouns and adjectives) from corpus examples. Because the same verb may appear in more than one frame, verb sense lists are created implicitly, with no guarantee of full coverage. Conversely, the PropBank project (Palmer et al. 2005) starts from syntactically annotated corpus data and assigns both roles and argument structure verb by verb. Both FrameNet and PropBank provide morphosyntactic restrictions, and FrameNet also adds ontological information on slot fillers.

For Danish, the target language of our own work, some semantic verb classification has been undertaken as part of the DanNet project (Pedersen et al. 2008). It covers ca. 3,000 verbs with 6,000 senses falling into 80 top classes, e.g. BoundedEvent + Physical + Location. Some incorporated adverbial material and reflexivity are provided as verb sense discriminators, but no frame roles or systematic selection restrictions are listed. Earlier work comprises the STO database, with almost 6,000 verbal entries, of which 4/5 offer syntactic and 1/5 semantic information (Braasch & Olsen 2004). It also includes the Odense Valency Dictionary (Schösler & Kirchmeier-Andersen 1997), which classified verbal argument semantics through the semantics of pronoun complements and covers ca. 4,000 verbs. The project described here was launched in 2006. It also regards valency as a useful stepping stone towards the semantic classification of verb structures. Our working assumption is that almost all subsenses of a given verb can be distinguished, and a full thematic role frame assigned, once the form, function and (noun) semantics of its complements are known.
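The working assumption above can be pictured as a keyed lookup: a verb's subsenses can be told apart once the form, function and noun semantics of its complements are known. The Python sketch below is a toy illustration only. The English lemma `run`, the valency labels, the semantic classes and the sense names are all hypothetical and are not Danish FrameNet data. The actual system expresses such distinctions as Constraint Grammar rules over DanGram parses, not as a dictionary.

```python
from typing import Dict, Optional, Tuple

# Hypothetical toy lexicon: (lemma, valency frame, semantic class of the object) -> frame sense.
# A semantic class of None marks an entry that applies regardless of the object's semantics.
# Lemma, labels and sense names are illustrative only.
ToyKey = Tuple[str, str, Optional[str]]
TOY_LEXICON: Dict[ToyKey, str] = {
    ("run", "intransitive", None): "move_self",
    ("run", "monotransitive", "organization"): "manage",
    ("run", "monotransitive", "machine"): "operate",
}


def select_sense(lemma: str, valency: str, obj_class: Optional[str] = None) -> Optional[str]:
    """Pick a frame sense from the valency frame and the object's noun semantics."""
    # Prefer an entry that also matches the object's semantic class ...
    sense = TOY_LEXICON.get((lemma, valency, obj_class))
    if sense is not None:
        return sense
    # ... otherwise fall back to an entry keyed on the valency frame alone.
    return TOY_LEXICON.get((lemma, valency, None))


print(select_sense("run", "intransitive"))               # move_self
print(select_sense("run", "monotransitive", "machine"))  # operate
print(select_sense("run", "monotransitive", "human"))    # None (no matching sense)
```

The sketch checks the more specific key first. This reflects the idea that noun semantics is only needed where the valency frame alone leaves more than one sense.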
Accordingly, we took the DanGram parser's valency dictionary (Bick 2001) as a point of departure and manually assigned verb classes and thematic role frames to each valency “sense” of a given verb. Corpus data and dictionaries were used to check sense coverage, and sense-based subdistinctions were added for the broader valency senses where necessary. Syntactic functions and forms of complements were already implicit in the valency tags and could therefore be assigned semi-automatically. Building semantic frames from “syntactic frames” also made it considerably easier to locate and check corpus examples. All syntactic complementation patterns were already available and searchable in corpora annotated with the DanGram parser (Bick 2001), which allowed focused inspection of semantic variation.

Bolette Sandford Pedersen, Gunta Nešpore and Inguna Skadiņa (Eds.) NODALIDA 2011 Conference Proceedings, pp. 34–41

2 The Danish FrameNet

After 4 years, our FrameNet (inspection demo at www.framenet.dk) has very good coverage of the DanGram lexicon. Further senses and patterns are still being added and existing ones revised, but the overall number of lexemes is now fairly stable at 6,825, with an average of 1.77 frames and 1.46 senses per lexeme. At the time of writing, this corresponds to about 11,000 valency patterns and 12,075 different verb frames, roughly twice the volume of DanNet. We use 494 different verb categories (cf. Appendix 1), grouped according to Levin's original classes and the VerbNet numbering system, albeit with a modified naming system and an expanded subclassification. Syntactic alternations such as diathesis or word order are not considered frame distinctors. However, we deviate from WordNet and VerbNet by making a class distinction for polarity antonyms like increase/decrease and like/dislike, and for the self/other distinction (move_self, move_other). We also try to avoid large underspecified classes (e.g.
change_of_state), while keeping the classification scheme as flat as possible, so that our categories can be used as corpus annotation tags or Constraint Grammar disambiguation tags. We have therefore introduced classes like heat/cool, activate/deactivate or open/close, which reduces the larger change_of_state class to a residual catch-all category.

Note 1: Besides the 494 fine-grained categories, a smaller set of 200 frame senses was also established, with a hypernym mapping from the more fine-grained set. This was done partly to allow some generalisation when the senses are used in, e.g., syntactic disambiguation rules, and partly to facilitate robust cross-language comparison and possibly transfer of frame types.

Note 2: We wanted the class names to be real verbs and, at the same time, to reflect hypernym meanings wherever possible. We therefore avoided both example-based names (common in VerbNet) and abstract concept names that are not verbs themselves (common in FrameNet).

3 Frame role distinctors: valency, syntactic function and semantic classes

The distinctional backbone of our frame inventory consists of syntactic valency frames such as (monotransitive), (ditransitive) and (prepositional ditransitive with the preposition “på” and a verb-incorporated 'ind' adverb). Each of these valency frames is assigned one or more verb senses, each with its own semantic frame. Depending, for instance, on the number of obligatory arguments, several valency or semantic frames may share the same verb sense. However, two different verb senses will almost always differ in at least one syntactic or semantic aspect of their argument frame (see Note 3). In principle, therefore, all senses can be disambiguated using a parser's argument tags and dependency links.

For each of our 12,000 verb sense frames, we provide a list of arguments with the following information:

1. Thematic role (Table 1)
2. Syntactic function (Table 2)
3. Morphosyntactic form (Table 4)
4. For NPs, a list of typical semantic prototypes to fill the slot (Table 3)
5. An English-language gloss / skeleton sentence

For about half of the frames (46%), a best-guess link to a DanNet verb sense is also provided, based on semi-automatic matches on adverb incorporation and hypernym classification.

Our FrameNet uses 38 thematic roles (or case/semantic roles, Fillmore 1968). It leaves out adverbial roles that never occur as valency-bound elements in a frame, but only in free adverbials (such as §COND for conditional subclauses). The 38 roles are far from evenly distributed in running text. Table 1 provides some live corpus data showing that the top 5 roles account for 2/3 of all role taggings in running text.

Table 1
Tag     Thematic role   Share in corpus
§TH     Theme           31.75%
§AG     Agent           12.25%
§ATR    Attribute       12.25%
§PAT    Patient          5.12%
§COG    Cognizer         4.69%
§SP     Speaker          4.15%
§RES    Result           3.78%
§LOC    Location         2.95%
§DES    Destination      2.86%

Note 3: In 780 cases, multiple verb senses share the same valency frame. In other words, in 6.5% of cases verb senses cannot be disambiguated on syntactic function and form alone and need help from semantic (noun) classes.
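The argument lists described above can be modelled roughly as in the Python sketch below. Each verb sense frame is a record with a gloss, an optional DanNet link and a list of argument slots carrying role, function, form and NP prototypes. The class and field names, the label formats and the example entry for *spise* ('eat') are hypothetical and do not reproduce the project's actual data format. The role percentages are copied from Table 1. Summing the five most frequent roles (31.75 + 12.25 + 12.25 + 5.12 + 4.69) gives 66.06%, consistent with the "2/3" figure in the text.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class FrameArgument:
    """One argument slot of a verb sense frame (hypothetical field names)."""
    role: str        # thematic role tag, e.g. "§AG" (cf. Table 1)
    function: str    # syntactic function (label format assumed)
    form: str        # morphosyntactic form (label format assumed)
    prototypes: List[str] = field(default_factory=list)  # semantic prototypes, NP slots only


@dataclass
class VerbSenseFrame:
    """A verb sense with its argument frame (hypothetical structure)."""
    lemma: str
    sense: str                        # frame sense / verb class name
    gloss: str                        # English gloss / skeleton sentence
    arguments: List[FrameArgument]
    dannet_link: Optional[str] = None  # optional best-guess DanNet sense


# Hypothetical example entry, not taken from the actual Danish FrameNet data.
EXAMPLE = VerbSenseFrame(
    lemma="spise",
    sense="eat",
    gloss="someone eats something",
    arguments=[
        FrameArgument("§AG", "subject", "np", ["human", "animal"]),
        FrameArgument("§PAT", "object", "np", ["food"]),
    ],
)

# Share of all role taggings in running text, in percent (Table 1).
ROLE_SHARES: Dict[str, float] = {
    "§TH": 31.75, "§AG": 12.25, "§ATR": 12.25, "§PAT": 5.12, "§COG": 4.69,
    "§SP": 4.15, "§RES": 3.78, "§LOC": 2.95, "§DES": 2.86,
}


def top_k_share(shares: Dict[str, float], k: int) -> float:
    """Summed percentage of the k most frequent roles."""
    return sum(sorted(shares.values(), reverse=True)[:k])


print(round(top_k_share(ROLE_SHARES, 5), 2))  # 66.06, i.e. about 2/3
```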


Publication date: 2011